# Multi-backend support (non-CUDA backends)

> [!Tip]
> If you feel these docs are missing information, please consider submitting a PR or respectfully requesting the missing information in one of the GitHub discussion spaces mentioned below.

As part of a recent refactoring effort, we will soon offer official multi-backend support. Currently, this feature is available in a preview alpha release, allowing us to gather early feedback from users to improve the functionality and identify any bugs.

At present, the Intel CPU and AMD ROCm backends are considered fully functional. The Intel XPU backend has limited functionality and is less mature.

Please refer to the [installation instructions](./installation#multi-backend) for details on installing the backend you intend to test (and hopefully provide feedback on).

> [!Tip]
> Apple Silicon support is planned for Q4 2024. We are actively seeking contributors to help implement it, develop a concrete plan, and create a detailed list of requirements. Due to limited resources, we rely on community contributions for this implementation effort. To discuss further, please share your thoughts in [this GitHub discussion](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1340) and tag `@Titus-von-Koeller` and `@matthewdouglas`. Thank you!

## Alpha Release

As we are currently in the alpha testing phase, bugs are expected, and performance might not meet expectations. However, this is exactly what we want to discover from **your** perspective as the end user!

Please share and discuss your feedback with us here:

- [GitHub Discussion: Multi-backend refactor: Alpha release ( AMD ROCm ONLY )](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1339)
- [GitHub Discussion: Multi-backend refactor: Alpha release ( Intel ONLY )](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1338)

Thank you for your support!

## Benchmarks

### Intel

The following performance data was collected on an Intel 4th Gen Xeon (SPR) platform. The tables show speed-up and memory usage across different data types for [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).

#### Inference (CPU)

| Data Type | BF16 | INT8 | NF4 | FP4 |
|---|---|---|---|---|
| Speed-Up (vs BF16) | 1.0x | 0.6x | 2.3x | 0.03x |
| Memory (GB) | 13.1 | 7.6 | 5.0 | 4.6 |

#### Fine-Tuning (CPU)

| Data Type | AMP BF16 | INT8 | NF4 | FP4 |
|---|---|---|---|---|
| Speed-Up (vs AMP BF16) | 1.0x | 0.38x | 0.07x | 0.07x |
| Memory (GB) | 40 | 9 | 6.6 | 6.6 |
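
To make the memory rows of the two tables above easier to compare, the short sketch below converts them into reduction factors relative to each table's baseline. It is plain arithmetic on the published figures, not a new measurement; the helper name `memory_reduction` and the dictionary names are illustrative assumptions and not part of bitsandbytes.

```python
# Memory figures (GB) copied from the inference and fine-tuning tables above.
inference_gb = {"BF16": 13.1, "INT8": 7.6, "NF4": 5.0, "FP4": 4.6}
finetune_gb = {"AMP BF16": 40.0, "INT8": 9.0, "NF4": 6.6, "FP4": 6.6}


def memory_reduction(table, baseline):
    """Return baseline memory divided by each data type's memory (hypothetical helper)."""
    base = table[baseline]
    return {dtype: round(base / gb, 2) for dtype, gb in table.items()}


inference_reduction = memory_reduction(inference_gb, "BF16")
finetune_reduction = memory_reduction(finetune_gb, "AMP BF16")

for name, result in (
    ("inference", inference_reduction),
    ("fine-tuning", finetune_reduction),
):
    for dtype, factor in result.items():
        # A factor of 2.0 means the data type needs half the baseline memory.
        print(f"{name:12s} {dtype:9s} memory reduction factor: {factor:.2f}x")
```

Read together with the speed-up rows, these factors suggest that on this platform NF4 inference is the only quantized configuration that is both smaller and faster than its baseline, while the fine-tuning configurations trade a large memory saving for lower throughput.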
